{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <p style=\"text-align:center\">\n",
    "        <img alt=\"arize langchain pinecone logos\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/langchain-pinecone-search-and-retrieval/arize_langchain_pinecone.png\" width=\"750\"/>\n",
    "        <br>\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">Arize-Phoenix</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/hwchase17/langchain\">LangChain</a>\n",
    "        |\n",
    "        <a href=\"https://www.pinecone.io/\">Pinecone</a>\n",
    "    </p>\n",
    "</center>\n",
    "<h1 align=\"center\">Evaluating and Improving Search and Retrieval Applications</h1>\n",
    "\n",
    "Imagine you're an engineer at Arize AI and you've built and deployed a documentation question-answering service using LangChain and Pinecone. Users send questions about Arize's core product via a chat interface, and your service retrieves chunks of your indexed documentation in order to generate a response to the user. As the engineer in charge of maintaining this system, you want to evaluate the quality of the responses from your service.\n",
    "\n",
    "Phoenix helps you:\n",
    "- identify gaps in your documentation\n",
    "- detect queries for which the LLM gave bad responses\n",
    "- detect failures to retrieve relevant documents\n",
    "\n",
    "In this tutorial, you will:\n",
    "\n",
    "- Ask questions of a LangChain application backed by Pinecone over a knowledge base of the Arize documentation\n",
    "- Use Phoenix to visualize user queries and knowledge base documents to identify areas of user interest not answered by your documentation\n",
    "- Find clusters of responses with negative user feedback\n",
    "- Identify failed retrievals using query density, cosine similarity, query distance, and LLM-assisted ranking metrics\n",
    "\n",
    "⚠️ This notebook assumes you've already indexed the Arize documentation with Pinecone. Run this [script](https://github.com/Arize-ai/phoenix/blob/main/tutorials/build_arize_docs_index_langchain_pinecone.py) to build your own Pinecone index over the Arize documentation. Parts of the notebook require an OpenAI API key and a Pinecone API key to run. If you don't have these API keys, you can still use Phoenix by skipping cells with the 💬 emoji."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chatbot Architecture\n",
    "\n",
    "The architecture of your chatbot is shown below and can be explained in five steps.\n",
    "\n",
    "![chatbot architecture diagram](http://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/langchain-pinecone-search-and-retrieval/langchain_pinecone_openai_chatbot_architecture.png)\n",
    "\n",
    "1. The user sends a query about Arize to your service.\n",
    "1. `langchain.embeddings.OpenAIEmbeddings` makes a request to OpenAI to embed the user query using the text-embedding-ada-002 model.\n",
    "1. `langchain.vectorstores.Pinecone` does a similarity search against the entries of your Pinecone database for the two most similar pieces of context by cosine similarity.\n",
    "1. `langchain.llms.ChatOpenAI` generates a response by formatting the query and retrieved context into a single prompt and sending a request to OpenAI with the gpt-3.5-turbo model.\n",
    "1. The response is returned to the user.\n",
    "\n",
    "Phoenix makes your search and retrieval system observable by capturing the inputs and outputs of these steps for analysis, including:\n",
    "\n",
    "- your query embeddings\n",
    "- the retrieved documents and similarity scores (relevance) to each query\n",
    "- the generated response that is return to the user\n",
    "\n",
    "With that overview in mind, let's dive into the notebook."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Install Dependencies and Import Libraries"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install Phoenix, LangChain, and Pinecone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"langchain>=0.0.334\" \"openai>=1\" arize-phoenix pinecone-client"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import textwrap\n",
    "from datetime import timedelta\n",
    "from getpass import getpass\n",
    "from typing import Dict, List, Optional, Tuple\n",
    "\n",
    "import numpy as np\n",
    "import openai\n",
    "import pandas as pd\n",
    "import phoenix as px\n",
    "import pinecone\n",
    "from IPython.display import YouTubeVideo\n",
    "from langchain.chains import RetrievalQA\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.docstore.document import Document\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.vectorstores import Pinecone\n",
    "from tenacity import (\n",
    "    retry,\n",
    "    stop_after_attempt,\n",
    "    wait_random_exponential,\n",
    ")\n",
    "from tqdm import tqdm\n",
    "\n",
    "pd.set_option(\"display.max_colwidth\", None)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Configure Your OpenAI API Key"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Configure your OpenAI API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
    "    openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
    "os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Configure Your Pinecone API Key and Environment"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Configure your Pinecone API key and environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pinecone_api_key = getpass(prompt=\"🔑 Enter your Pinecone API key: \")\n",
    "pinecone_environment = getpass(prompt=\"set your Pinecone environment\")\n",
    "pinecone_index_name = getpass(prompt=\"set your Pinecone index name\")\n",
    "pinecone.init(api_key=pinecone_api_key, environment=pinecone_environment)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Run Your Question-Answering Service"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Define a wrapper class around `lanchain.embeddings.openai.OpenAIEmbeddings` to record query and document embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class OpenAIEmbeddingsWrapper(OpenAIEmbeddings):\n",
    "    \"\"\"\n",
    "    A wrapper around OpenAIEmbeddings that stores the query and document\n",
    "    embeddings.\n",
    "    \"\"\"\n",
    "\n",
    "    query_text_to_embedding: Dict[str, List[float]] = {}\n",
    "    document_text_to_embedding: Dict[str, List[float]] = {}\n",
    "\n",
    "    def embed_query(self, text: str) -> List[float]:\n",
    "        embedding = super().embed_query(text)\n",
    "        self.query_text_to_embedding[text] = embedding\n",
    "        return embedding\n",
    "\n",
    "    def embed_documents(self, texts: List[str], chunk_size: Optional[int] = 0) -> List[List[float]]:\n",
    "        embeddings = super().embed_documents(texts, chunk_size)\n",
    "        for text, embedding in zip(texts, embeddings):\n",
    "            self.document_text_to_embedding[text] = embedding\n",
    "        return embeddings\n",
    "\n",
    "    @property\n",
    "    def query_embedding_dataframe(self) -> pd.DataFrame:\n",
    "        return self._convert_text_to_embedding_map_to_dataframe(self.query_text_to_embedding)\n",
    "\n",
    "    @property\n",
    "    def document_embedding_dataframe(self) -> pd.DataFrame:\n",
    "        return self._convert_text_to_embedding_map_to_dataframe(self.document_text_to_embedding)\n",
    "\n",
    "    @staticmethod\n",
    "    def _convert_text_to_embedding_map_to_dataframe(\n",
    "        text_to_embedding: Dict[str, List[float]],\n",
    "    ) -> pd.DataFrame:\n",
    "        texts, embeddings = map(list, zip(*text_to_embedding.items()))\n",
    "        embedding_arrays = [np.array(embedding) for embedding in embeddings]\n",
    "        return pd.DataFrame.from_dict(\n",
    "            {\n",
    "                \"text\": texts,\n",
    "                \"text_vector\": embedding_arrays,\n",
    "            }\n",
    "        )"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Define a wrapper around `langchain.vectorstores.Pinecone` to record retrieval data, including the retrieved pieces of context for each query and the associated similarity scores between query and context embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class PineconeWrapper(Pinecone):\n",
    "    query_text_to_document_score_tuples: Dict[str, List[Tuple[Document, float]]] = {}\n",
    "\n",
    "    def similarity_search_with_score(\n",
    "        self,\n",
    "        query: str,\n",
    "        k: int = 4,\n",
    "        filter: Optional[dict] = None,\n",
    "        namespace: Optional[str] = None,\n",
    "    ) -> List[Tuple[Document, float]]:\n",
    "        document_score_tuples = super().similarity_search_with_score(\n",
    "            query=query,\n",
    "            k=k,\n",
    "            filter=filter,\n",
    "            namespace=namespace,\n",
    "        )\n",
    "        self.query_text_to_document_score_tuples[query] = document_score_tuples\n",
    "        return document_score_tuples\n",
    "\n",
    "    @property\n",
    "    def retrieval_dataframe(self) -> pd.DataFrame:\n",
    "        query_texts = []\n",
    "        document_texts = []\n",
    "        retrieval_ranks = []\n",
    "        scores = []\n",
    "        for query_text, document_score_tuples in self.query_text_to_document_score_tuples.items():\n",
    "            for retrieval_rank, (document, score) in enumerate(document_score_tuples):\n",
    "                query_texts.append(query_text)\n",
    "                document_texts.append(document.page_content)\n",
    "                retrieval_ranks.append(retrieval_rank)\n",
    "                scores.append(score)\n",
    "        return pd.DataFrame.from_dict(\n",
    "            {\n",
    "                \"query_text\": query_texts,\n",
    "                \"document_text\": document_texts,\n",
    "                \"retrieval_rank\": retrieval_ranks,\n",
    "                \"score\": scores,\n",
    "            }\n",
    "        )"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Define an instance of `langchain.chains.RetrievalQA` using `OpenAIEmbeddingsWrapper` and `PineconeWrapper`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "embedding_model_name = \"text-embedding-ada-002\"\n",
    "num_retrieved_documents = 2\n",
    "embeddings = OpenAIEmbeddingsWrapper(model=embedding_model_name)\n",
    "docsearch = PineconeWrapper.from_existing_index(\n",
    "    index_name=pinecone_index_name,\n",
    "    embedding=embeddings,\n",
    ")\n",
    "chat_model_name = \"gpt-3.5-turbo\"\n",
    "llm = ChatOpenAI(model_name=chat_model_name)\n",
    "chain = RetrievalQA.from_chain_type(\n",
    "    llm=llm,\n",
    "    chain_type=\"stuff\",\n",
    "    retriever=docsearch.as_retriever(search_kwargs={\"k\": num_retrieved_documents}),\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Run a query and display the results, including the final response from the chatbot, the retrieved pieces of context and associated similarity scores, and a portion of the query embedding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_text = \"How do I get an Arize API key?\"\n",
    "# query_text = \"What's the difference between primary and baseline datasets?\"\n",
    "# query_text = \"How do I send in extra metadata with each record?\"\n",
    "# query_text = \"How does Arize's surrogate explainability model work?\"\n",
    "\n",
    "response_text = chain.run(query_text)\n",
    "retrievals_df = docsearch.retrieval_dataframe.tail(num_retrieved_documents)\n",
    "contexts = retrievals_df[\"document_text\"].to_list()\n",
    "scores = retrievals_df[\"score\"].to_list()\n",
    "query_embedding = embeddings.query_embedding_dataframe[\"text_vector\"].iloc[-1]\n",
    "\n",
    "print(\"Response\")\n",
    "print(\"========\")\n",
    "print()\n",
    "for line in textwrap.wrap(response_text.strip(), width=80):\n",
    "    print(line)\n",
    "print()\n",
    "\n",
    "for context_index, (context, score) in enumerate(zip(contexts, scores)):\n",
    "    print(f\"Retrieved Context {context_index}\")\n",
    "    print(\"===================\")\n",
    "    print()\n",
    "\n",
    "    for line in textwrap.wrap(context.strip(), width=80):\n",
    "        print(line)\n",
    "    print()\n",
    "    print(f\"score: {score}\")\n",
    "    print()\n",
    "\n",
    "print(\"Query Embedding\")\n",
    "print(\"===============\")\n",
    "print()\n",
    "\n",
    "print(query_embedding)\n",
    "print()\n",
    "print(f\"dimension: {len(query_embedding)}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Load Your Data Into Pandas Dataframes"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use Phoenix, you must load your data into Pandas dataframes. First, load your knowledge base into a dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "database_df = pd.read_parquet(\n",
    "    \"http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/langchain-pinecone/database.parquet\"\n",
    ")\n",
    "database_df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The columns of your dataframe are:\n",
    "- **text:** the chunked text in your knowledge base\n",
    "- **text_vector:** the embedding vector for the text, computed during the LangChain build using the \"text-embedding-ada-002\" embedding model from OpenAI"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, download a dataframe containing query data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_df = pd.read_parquet(\n",
    "    \"http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/langchain-pinecone/langchain_pinecone_query_dataframe_with_user_feedbackv2.parquet\"\n",
    ")\n",
    "query_df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The columns of the dataframe are:\n",
    "- **text:** the query text\n",
    "- **text_vector:** the embedding representation of the query, captured from LangChain at query time\n",
    "- **response:** the final response from the LangChain application\n",
    "- **context_text_0:** the first retrieved context from the knowledge base\n",
    "- **context_similarity_0:** the cosine similarity between the query and the first retrieved context\n",
    "- **context_text_1:** the second retrieved context from the knowledge base\n",
    "- **context_similarity_1:** the cosine similarity between the query and the first retrieved context\n",
    "- **user_feedback:** approval or rejection from the user (-1 means thumbs down, +1 means thumbs up)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The query and database datasets are drawn from different distributions; the queries are short questions while the database entries are several sentences to a paragraph. The embeddings from OpenAI's \"text-embedding-ada-002\" capture these differences and naturally separate the query and context embeddings into distinct regions of the embedding space. When using Phoenix, you want to \"overlay\" the query and context embedding distributions so that queries appear close to their retrieved context in the Phoenix point cloud. To achieve this, we compute a centroid for each dataset that represents an average point in the embedding distribution and center the two distributions so they overlap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "database_centroid = database_df[\"text_vector\"].mean()\n",
    "database_df[\"centered_text_vector\"] = database_df[\"text_vector\"].apply(\n",
    "    lambda x: x - database_centroid\n",
    ")\n",
    "query_centroid = query_df[\"text_vector\"].mean()\n",
    "query_df[\"centered_text_vector\"] = query_df[\"text_vector\"].apply(lambda x: x - query_centroid)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Run LLM-Assisted Evaluations"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cosine similarity and Euclidean distance are reasonable proxies for retrieval quality, but they don't always work perfectly. A novel idea is to use LLMs to measure retrieval quality by simply asking the LLM whether each piece of retrieved context is relevant or irrelevant to the corresponding query.\n",
    "\n",
    "⚠️ It's strongly recommended to use GPT-4 for this step if you have access, since we've found it to be more trustworthy for this particular task."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💬 Use OpenAI to predict whether each retrieved document is relevant or irrelevant to the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "EVALUATION_SYSTEM_MESSAGE = (\n",
    "    \"You will be given a query and a reference text. \"\n",
    "    \"You must determine whether the reference text contains an answer to the input query. \"\n",
    "    \"Your response must be binary (0 or 1) and \"\n",
    "    \"should not contain any text or characters aside from 0 or 1. \"\n",
    "    \"0 means that the reference text does not contain an answer to the query. \"\n",
    "    \"1 means the reference text contains an answer to the query.\"\n",
    ")\n",
    "QUERY_CONTEXT_PROMPT_TEMPLATE = \"\"\"# Query: {query}\n",
    "\n",
    "# Reference: {reference}\n",
    "\n",
    "# Binary: \"\"\"\n",
    "\n",
    "\n",
    "@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))\n",
    "def evaluate_query_and_retrieved_context(query: str, context: str, model_name: str) -> str:\n",
    "    prompt = QUERY_CONTEXT_PROMPT_TEMPLATE.format(\n",
    "        query=query,\n",
    "        reference=context,\n",
    "    )\n",
    "    response = openai.ChatCompletion.create(\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": EVALUATION_SYSTEM_MESSAGE},\n",
    "            {\"role\": \"user\", \"content\": prompt},\n",
    "        ],\n",
    "        model=model_name,\n",
    "    )\n",
    "    return response[\"choices\"][0][\"message\"][\"content\"]\n",
    "\n",
    "\n",
    "def evaluate_retrievals(\n",
    "    retrievals_data: Dict[str, str],\n",
    "    model_name: str,\n",
    ") -> List[str]:\n",
    "    responses = []\n",
    "    for query, retrieved_context in tqdm(retrievals_data.items()):\n",
    "        response = evaluate_query_and_retrieved_context(query, retrieved_context, model_name)\n",
    "        responses.append(response)\n",
    "    return responses\n",
    "\n",
    "\n",
    "def process_binary_responses(\n",
    "    binary_responses: List[str], binary_to_string_map: Dict[int, str]\n",
    ") -> List[str]:\n",
    "    \"\"\"\n",
    "    Parse binary responses and convert to the desired format\n",
    "    converts them to the desired format. The binary_to_string_map parameter\n",
    "    should be a dictionary mapping binary values (0 or 1) to the desired\n",
    "    string values (e.g. \"irrelevant\" or \"relevant\").\n",
    "    \"\"\"\n",
    "    processed_responses = []\n",
    "    for binary_response in binary_responses:\n",
    "        try:\n",
    "            binary_value = int(binary_response.strip())\n",
    "            processed_response = binary_to_string_map[binary_value]\n",
    "        except (ValueError, KeyError):\n",
    "            processed_response = None\n",
    "        processed_responses.append(processed_response)\n",
    "    return processed_responses\n",
    "\n",
    "\n",
    "sample_query_df = query_df.head(10).copy()\n",
    "evaluation_model_name = \"gpt-3.5-turbo\"  # use GPT-4 if you have access\n",
    "for context_index in range(num_retrieved_documents):\n",
    "    retrievals_data = {\n",
    "        row[\"text\"]: row[f\"context_text_{context_index}\"] for _, row in sample_query_df.iterrows()\n",
    "    }\n",
    "    raw_responses = evaluate_retrievals(retrievals_data, evaluation_model_name)\n",
    "    processed_responses = process_binary_responses(raw_responses, {0: \"irrelevant\", 1: \"relevant\"})\n",
    "    sample_query_df[f\"openai_relevance_{context_index}\"] = processed_responses\n",
    "sample_query_df[\n",
    "    [\"text\", \"context_text_0\", \"openai_relevance_0\", \"context_text_1\", \"openai_relevance_1\"]\n",
    "].head(10)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running evaluations across the entire dataset takes a while, so download a dataset of pre-computed evaluations and add to the query dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "openai_evaluations_df = pd.read_parquet(\n",
    "    \"http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/langchain-pinecone/langchain_pinecone_openai_evaluations.parquet\",\n",
    ")[[\"text\", \"openai_relevance_0\", \"openai_relevance_1\"]]\n",
    "query_df = pd.merge(query_df, openai_evaluations_df, on=\"text\")\n",
    "query_df[[\"text\", \"context_text_0\", \"context_text_1\", \"openai_relevance_0\", \"openai_relevance_1\"]]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Compute Ranking Metrics"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you know whether each piece of retrieved context is relevant or irrelevant to the corresponding query, you can compute precision@k for k = 1, 2 for each query. This metric tells you what percentage of the retrieved context is relevant to the corresponding query.\n",
    "\n",
    "precision@k = (# of top-k retrieved documents that are relevant) / (k retrieved documents)\n",
    "\n",
    "If your precision@2 is greater than zero for a particular query, your LangChain application successfully retrieved at least one relevant piece of context with which to answer the query. If the precision@k is zero for a particular query, that means that no relevant piece of context was retrieved."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute precision@k for k = 1, 2 and view the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_relevant_documents_array = np.zeros(len(query_df))\n",
    "num_retrieved_documents = 2\n",
    "for retrieved_document_index in range(0, num_retrieved_documents):\n",
    "    num_retrieved_documents = retrieved_document_index + 1\n",
    "    num_relevant_documents_array += (\n",
    "        query_df[f\"openai_relevance_{retrieved_document_index}\"]\n",
    "        .map(lambda x: int(x == \"relevant\"))\n",
    "        .to_numpy()\n",
    "    )\n",
    "    query_df[f\"openai_precision@{num_retrieved_documents}\"] = pd.Series(\n",
    "        num_relevant_documents_array / num_retrieved_documents\n",
    "    )\n",
    "\n",
    "query_df[\n",
    "    [\n",
    "        \"openai_relevance_0\",\n",
    "        \"openai_relevance_1\",\n",
    "        \"openai_precision@1\",\n",
    "        \"openai_precision@2\",\n",
    "    ]\n",
    "]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Launch Phoenix"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define a schema to tell Phoenix what the columns of your query and database dataframes represent (features, predictions, actuals, tags, embeddings, etc.). See the [docs](https://docs.arize.com/phoenix/) for guides on how to define your own schema and API reference on `phoenix.Schema` and `phoenix.EmbeddingColumnNames`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_schema = px.Schema(\n",
    "    prompt_column_names=px.EmbeddingColumnNames(\n",
    "        raw_data_column_name=\"text\",\n",
    "        vector_column_name=\"centered_text_vector\",\n",
    "    ),\n",
    "    response_column_names=\"response\",\n",
    "    tag_column_names=[\n",
    "        \"context_text_0\",\n",
    "        \"context_similarity_0\",\n",
    "        \"context_text_1\",\n",
    "        \"context_similarity_1\",\n",
    "        \"euclidean_distance_0\",\n",
    "        \"euclidean_distance_1\",\n",
    "        \"openai_relevance_0\",\n",
    "        \"openai_relevance_1\",\n",
    "        \"openai_precision@1\",\n",
    "        \"openai_precision@2\",\n",
    "        \"user_feedback\",\n",
    "    ],\n",
    ")\n",
    "database_schema = px.Schema(\n",
    "    document_column_names=px.EmbeddingColumnNames(\n",
    "        raw_data_column_name=\"text\",\n",
    "        vector_column_name=\"centered_text_vector\",\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create Phoenix datasets that wrap your dataframes with the schemas that describe them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "database_ds = px.Dataset(\n",
    "    dataframe=database_df,\n",
    "    schema=database_schema,\n",
    "    name=\"pinecone\",\n",
    ")\n",
    "query_ds = px.Dataset(\n",
    "    dataframe=query_df,\n",
    "    schema=query_schema,\n",
    "    name=\"query\",\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Launch Phoenix. Note that the database dataset is passed in as the `corpus` - a reserved keyword argument for passing in the knowledge base for search and retrieval analysis. Follow the instructions in the cell output to open the Phoenix UI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = px.launch_app(query_ds, corpus=database_ds)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Surface Problematic Clusters and Data Points\n",
    "\n",
    "Phoenix helps you:\n",
    "\n",
    "- visualize your embeddings\n",
    "- color the resulting point cloud using evaluation metrics\n",
    "- cluster the points and surface up problematic clusters based on whatever metric you care about\n",
    "\n",
    "Follow along with the tutorial walkthrough [here](https://youtu.be/eDW1EsXMNY4?t=1850), or view the video in your notebook by running the cell below. The video will show you how to investigate your query and knowledge base and identify problematic clusters of data points using Phoenix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "start_time_in_seconds = int(timedelta(hours=0, minutes=30, seconds=50).total_seconds())\n",
    "YouTubeVideo(\"eDW1EsXMNY4\", start=start_time_in_seconds, width=560, height=315)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congrats! You've identified a problematic cluster of queries. You now have tools at your disposal to investigate clusters of queries where your search and retrieval application is performing poorly based on:\n",
    "\n",
    "- query percentage\n",
    "- user feedback\n",
    "- LLM-assisted ranking metrics\n",
    "\n",
    "As an actionable next step, you should augment your knowledge base to include information about the pricing and cost of the Arize platform, since your users seem especially interested in this topic."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
